Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General tweaks & actual code generation for int f() { return 5; } #7

Merged
merged 42 commits into from
Feb 6, 2024

Conversation

Jpnock
Copy link
Collaborator

@Jpnock Jpnock commented Jan 30, 2024

What does this PR do?

  • Implements actual code generation for int f() { return 5; } (@Fiwo735 would you be able to update your image to add INT_CONSTANT to the bottom of it :))
  • Deletes a duplicate test int f() { return 5678; }, since we previously had three of these with just different constant numbers,
  • Refactors the compiler entry point, main(), such that it is much clearer what's going on
  • Adds lcov to the Dockerfile, such that Quentin's coverage code will now work inside the container environment
  • Extends the coverage docs to demonstrate how to view the coverage webpage
  • Terminates test.sh early if make fails
  • Improves the output of test.sh to be much more human readable
  • Adds a print method to Node to allow for the parsed AST to be visualised
  • Enables the Address Sanitizer in a discrete way to allow segfaults to be more easily debugged
  • Updates .gitignore to ignore coverage and dependency files
  • Simplifies the grammar file, such that it only contains the required elements to parse int f() { return 5; }. Note that the full grammar file is retained and docs have been added around how it can be used (using the full grammar does massively increase test time).
  • Adds a NodeList helper class and demonstrates its use for places in the grammar where a list of nodes are required
  • Coverage is now invoked via COVERAGE=1 ./test.sh to allow for args to the script to be used for other purposes in the future (such as paths of specific tests to run)

Comments on ASAN

Since ASAN (AddressSanitizer) is turned on, segfaults can easily be diagnosed by reading the log output from stderr (saved into the bin/output directory when you run ./test.sh). This gives a nice stack trace with the problematic line number.

Students will only see this if they navigate to the log file for the test (e.g. the first file in the console output below). In other words, they won't likely see it until they start hunting around the log files when they're having problems getting a test to pass.

image

To prevent ASAN from changing the compiler exit code, I've set ASAN_OPTIONS=exitcode=0 to allow the tests in test.sh to still be attempted even if these logs are produced.

==1859841==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000
==1859841==The signal is caused by a READ memory access.
==1859841==Hint: address points to the zero page.
    #0 in ReturnStatement::emitRISC(std::ostream&, Context&) const src/ast_statements.cpp:14
    #1 in NodeList::emitRISC(std::ostream&, Context&) const include/ast_node.hpp:51
    #2 in FunctionDefinition::emitRISC(std::ostream&, Context&) const src/ast_function_definition.cpp:18
    #3 in main src/compiler.cpp:47

@Jpnock Jpnock marked this pull request as ready for review January 30, 2024 22:30
@Jpnock Jpnock marked this pull request as draft January 30, 2024 22:31
@Jpnock Jpnock marked this pull request as ready for review January 30, 2024 23:52
Copy link
Collaborator

@Fiwo735 Fiwo735 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet another great set of changes! The repo is getting more and more professional, especially for a coursework 😄 I especially like providing the two ways of going about the grammar (simplified vs full) as it should give the students an interesting design choice.

I have a few suggestions, which can be summarised as (please make sure I marked all the relevant code snippets):

  • using snake_case for variable names to easily distinguish them from functions and classes also somewhat adhere to Google's style guide
  • keeping the implementations of emitRISC (and the newly introduced print for that matter) in separate .cpp files as that's a general good practice and it would be consistent with other emitRISC implementations (which will especially make a lot of sense in cases where the implementation will require dozens of code lines)
  • making all emitRISC and print functions consistently const override for reminding students about inheritance and having a compiler do a sanity check as well.
  • using initializer list when applicable

I also wanted to discuss some things:

  • Do you think we should keep a reminder in every emitRISC and print function we provide that it's a simple dummy version and they should change it? Or just mention that in the specs (which I think we already do) and keep the code a bit cleaner?
  • I'm not sure I see the point of replacing specific jump_statement (.hpp/.cpp) with statements (.hpp/.cpp) that I assume would include definitions and implementations of all kinds of statements. To me it seems more error-prone and less consistent with how other .cpp/.hpp relate to ast nodes.
  • Given the new simplified grammar, I'd say we actually go a step further and delete some of the "chains" of rules that introduce nothing (... shift_expression -> additive_expression -> multiplicative_expression -> cast_expression ... etc.) in order to make the grammer even less daunting while giving the students complete freedom in how they go about implementing features should they choose the simple grammar as the starting point. In a way that would further cater students who don't want to fully understand the official grammar and just want to make small steps towards the featutes that seem feasible at the time to them.

include/ast_function_definition.hpp Outdated Show resolved Hide resolved
include/ast_function_definition.hpp Outdated Show resolved Hide resolved
include/ast_function_definition.hpp Outdated Show resolved Hide resolved
include/ast_constant.hpp Outdated Show resolved Hide resolved
include/ast_direct_declarator.hpp Outdated Show resolved Hide resolved
src/compiler.cpp Outdated Show resolved Hide resolved
src/compiler.cpp Outdated Show resolved Hide resolved
src/compiler.cpp Outdated Show resolved Hide resolved
src/compiler.cpp Outdated Show resolved Hide resolved
src/compiler.cpp Outdated Show resolved Hide resolved
@Fiwo735
Copy link
Collaborator

Fiwo735 commented Jan 31, 2024

Btw, would you mind removing "[WIP]" from somewhere in the changelog, as I forgot and wanted to avoid creating a new pull request just for that? And also, another pull request from a student happened in parallel to yours and there seems to be overlap - would you mind committing theirs idea (quite small) here, so that we could avoid a merge conflict?

@Jpnock
Copy link
Collaborator Author

Jpnock commented Jan 31, 2024

Yet another great set of changes! The repo is getting more and more professional, especially for a coursework 😄 I especially like providing the two ways of going about the grammar (simplified vs full) as it should give the students an interesting design choice.

Thanks for the review :)

Top four suggestions have been implemented and the [WIP] leftover has been removed. I've also created a CONTRIBUTING.md based on this which summarises the rules we now follow from the Google style guide.

Stderr Warnings

I've omitted the not implemented warnings as you suggested since this is already clear in the docs.

Long grammar chain

As for the long chain in the grammar, I had a think about this and thought it was probably better to keep. I say this for two reasons:

  1. It means there are no actual changes between the simplified grammar and the full grammar, other than deletions. I think this will make adding things back one at a time easier than if they have to start figuring out where changes were made to it and reverting our changes when they add more rules.
  2. I think it's also easier to introduce them to this chain now where they can just click on each non-terminal and follow it directly up, whereas it's much harder to see how this chain works in the full grammar since it no longer fits in the height of your window.

I did move it to the bottom of the grammar file though so it's seen last when going through the grammar.

Jump statement file

I've renamed from ast_statements.(hpp|cpp) to ast_jump_statement.(hpp|cpp). I think it's fine to have all the jump statements in there (e.g. ReturnStatement, ContinueStatement) but yeah you're right that if it's every statement in the AST it's going to be way too many.

@Fiwo735 Fiwo735 merged commit d629d2f into main Feb 6, 2024
2 checks passed
@Fiwo735 Fiwo735 deleted the jpnock/2024-further-tweaks branch February 6, 2024 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants